
fix: file size mismatch when download interrupted #176

Merged
rtuszik merged 5 commits into rtuszik:main from sEpt0r:fix-downloader
Oct 31, 2025

Conversation

Contributor

@sEpt0r sEpt0r commented Oct 30, 2025

Problem

When downloading large files (e.g., a 117GB photon-db), connection failures cause the entire download to restart from scratch instead of resuming, because the download state is saved only every 1MB.

Why it happens:

  1. Download state is saved every 1MB (save_interval = 1024 * 1024)
  2. When the connection drops mid-download, the file contains more data than the last saved state
  3. On retry, the exact size check fails
  4. The script logs "File size mismatch, starting fresh download" and deletes the state
  5. The download restarts from 0 bytes instead of resuming
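The failing check in step 3 can be sketched as follows (a minimal illustration with assumed names, not the project's actual code):

```python
import os

def can_resume_strict(file_path: str, saved_size: int) -> bool:
    """Hypothetical sketch of the pre-fix check: resume only when the
    on-disk size matches the saved state exactly."""
    actual = os.path.getsize(file_path)
    # Any bytes written after the last 1MB state save make
    # actual > saved_size, so this returns False, the state is deleted,
    # and the download restarts from zero.
    return actual == saved_size
```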

Fix

This fix ensures that progress is saved even when data is written between periodic saves (every 1MB), preventing the loss of up to 1MB of progress.

Additionally, it increases the connection timeout to 30s and the read timeout to 60s, and adds exponential backoff between retries to make downloads more reliable on slow or unstable networks.
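A minimal sketch of the persistence fix, with hypothetical names (write_with_state and save_state are illustrative, not the module's real API):

```python
def write_with_state(chunks, out, state, save_state, save_interval=1024 * 1024):
    """Write chunks to `out`, saving progress every `save_interval`
    bytes AND whenever the loop exits, cleanly or not."""
    written = state.get("size", 0)
    unsaved = 0
    try:
        for chunk in chunks:
            out.write(chunk)
            written += len(chunk)
            unsaved += len(chunk)
            if unsaved >= save_interval:
                state["size"] = written
                save_state(state)
                unsaved = 0
    finally:
        # Runs on success and on exception alike, so the saved state
        # always matches the bytes actually on disk.
        state["size"] = written
        save_state(state)
```

The `finally` block is what closes the window: even if the connection drops mid-chunk, the state on disk reflects every byte that was written.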

Test

I built a new container and ran tests on my slowest setup.
After a network interruption, the Python script continued downloading the file:

2025-10-30 22:09:53,250 - root - INFO - Download progress: 6.8% (7.93GB / 116.98GB) - 47.3 Mbps - ETA: 5h 30m
2025-10-30 22:10:03,250 - root - INFO - Download progress: 6.8% (7.98GB / 116.98GB) - 48.2 Mbps - ETA: 5h 23m
2025-10-30 22:10:13,251 - root - INFO - Download progress: 6.9% (8.04GB / 116.98GB) - 47.5 Mbps - ETA: 5h 28m
2025-10-30 22:11:16,951 - root - WARNING - Download attempt 1 failed: HTTPSConnectionPool(host='r2.koalasec.org', port=443): Read timed out.
2025-10-30 22:11:16,954 - root - INFO - Waiting 1s before retry...
2025-10-30 22:11:17,956 - root - INFO - Retrying download (attempt 2/3)...
2025-10-30 22:11:17,960 - root - INFO - Resuming download: file size 8654962688 bytes (saved state: 8654962688 bytes)
2025-10-30 22:11:17,961 - root - INFO - Resuming download from byte 8654962688
2025-10-30 22:11:18,851 - root - INFO - Starting download of 116.98GB to photon-db-latest.tar.bz2
2025-10-30 22:11:28,854 - root - INFO - Download progress: 6.9% (8.11GB / 116.98GB) - 38.5 Mbps - ETA: 6h 44m
2025-10-30 22:11:38,856 - root - INFO - Download progress: 7.0% (8.16GB / 116.98GB) - 47.0 Mbps - ETA: 5h 31m
2025-10-30 22:11:48,859 - root - INFO - Download progress: 7.0% (8.22GB / 116.98GB) - 47.2 Mbps - ETA: 5h 29m

It may still be worth increasing the read timeout further (e.g., from timeout=(30, 60) to timeout=(30, 300)), but at least the download now continues instead of restarting from the beginning.
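For reference, the (connect, read) tuple accepted by the requests library separates the two timeout phases (a sketch of the semantics, not the downloader's code):

```python
# requests accepts a (connect, read) tuple: the first bound covers
# establishing the TCP/TLS connection, the second covers the wait
# between received chunks while streaming. Raising only the read
# timeout (e.g. to 300) tolerates slow mirrors without slowing the
# detection of hosts that never answer at all.
CONNECT_TIMEOUT = 30
READ_TIMEOUT = 60
TIMEOUT = (CONNECT_TIMEOUT, READ_TIMEOUT)

# Usage (assuming the `requests` library the downloader already uses):
# response = requests.get(url, stream=True, timeout=TIMEOUT)
```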

Fixes #169

chore: replace workflow repo tags by variables
When downloading large files (e.g., 117GB photon-db), connection
failures cause the entire download to restart from scratch instead of
resuming, because download state is saved every 1MB.

When connection drops mid-download, the file has MORE data than the last
saved state and on retry, the exact size check fails.

This fix ensures progress is saved even when data is written between
periodic saves (every 1MB), preventing loss of up to 1MB of progress.

Also, increased the connection timeout (30s) and read timeout (60s), and
added backoff between retries to be more reliable on slow and unstable
networks.
@coderabbitai

coderabbitai bot commented Oct 30, 2025

📝 Walkthrough


Updates to the downloader module introduce more robust download handling with improved resume logic, error handling with state persistence, extended HTTP timeouts with differentiated connect and read settings, and exponential backoff retry logic on request failures.

Changes

Cohort / File(s): Download robustness enhancements (src/downloader.py)
Change Summary: Enhanced resume logic in load_download_state to treat resumed downloads as valid when the actual file size matches or exceeds the saved size, updating state accordingly; wrapped the _download_content loop with try/except to persist state on both completion and exception before re-raising; changed the _perform_download HTTP timeout from a single value to a (30, 60) tuple for separate connect and read timeouts; implemented exponential backoff (1s, 2s, 4s) retry logic in download_file on RequestException, with logging at each attempt; ensured save_download_state is called after successful downloads.

Pre-merge checks

❌ Failed checks (1 warning)
Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (4 passed)
Title Check (✅ Passed): The title "fix: file size mismatch when download interrupted" is concise and specific, clearly summarising the primary change which directly addresses the main issue identified in the pull request. It accurately reflects that the PR tackles the core problem where download interruptions caused incorrect file size checks, leading to full restart rather than resumption. The title provides enough clarity for a developer scanning the history to understand the key improvement without being vague or overly broad.
Linked Issues Check (✅ Passed): The code changes directly address all primary objectives from issue #169. The resume logic enhancement addresses the file size mismatch problem by comparing actual file size against saved state rather than requiring exact equality, preventing unnecessary fresh downloads. The timeout extension from 20 seconds to a (30, 60) tuple addresses the read timeout failures on mirrors. The exponential backoff implementation (1s, 2s, 4s delays) and improved retry flow tackle the challenge of exceeding maximum attempts. State persistence after each download and on exceptions ensures progress is not lost between saves, collectively enabling reliable download and extract of large files as requested in the linked issue.
Out of Scope Changes Check (✅ Passed): All changes in the pull request are directly scoped to the stated objectives from issue #169. The modifications to the downloader logic, including resume logic, error handling, state persistence, timeout configuration, and retry backoff, are all specifically aimed at fixing the download interruption issues and file size mismatch problems. The raw summary confirms no alterations to exported or public entities and no unrelated changes have been introduced. Every change serves a clear purpose in improving download reliability on slow or unstable networks.
Description Check (✅ Passed): The pull request description is well-structured and directly related to the changeset, detailing the problem (file size mismatch causing restart), the root cause (state saved only every 1MB), and the solution (improved state persistence, extended timeouts, and exponential backoff retry logic). The description also includes concrete test evidence from actual testing on a slow setup, demonstrating that the fix works in practice with log excerpts showing successful resumption after network interruption. This level of detail and relevance clearly satisfies the lenient pass criterion for this check.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between fd434ee and 39771e8.

📒 Files selected for processing (1)
  • src/downloader.py (4 hunks)
🔇 Additional comments (4)
src/downloader.py (4)

116-123: Graceful resume when the file is ahead of the saved state

Allowing the resume path whenever the on-disk file has at least as many bytes as the recorded state keeps the safety net against truncation while eliminating the old restart penalty. The extra logging makes the decision path clear in logs too.
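The relaxed check described here can be sketched like this (assumed function shape; the real load_download_state differs):

```python
import os

def resume_offset(file_path: str, saved_size: int) -> int:
    """Return the byte offset to resume from (sketch; names and
    shape are illustrative, not the module's real API)."""
    actual = os.path.getsize(file_path) if os.path.exists(file_path) else 0
    if saved_size > 0 and actual >= saved_size:
        # File is at or ahead of the recorded state: resume from the
        # actual end of the file and treat that as the new state.
        return actual
    # File is missing or shorter than the state claims (possible
    # truncation): safer to start fresh.
    return 0
```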


397-432: Solid guard that always persists progress

Persisting the state after every loop—in both the clean exit path and the exception path—seals the window where we could lose up to 1 MB. This is exactly what was biting the large downloads.


459-459: Timeout split matches real-world mirror latency

Moving to (30, 60) keeps the connect phase snappy while letting slow mirrors stream without tripping the read timeout. Sensible choice.


514-516: Back-off is a good fit for flaky networks

The exponential wait before retries gives mirrors a breather without making the user wait forever. Nice incremental resilience win.
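The back-off policy can be sketched generically (with_retries is a hypothetical wrapper; the real code catches requests.RequestException inside download_file):

```python
import time

def with_retries(operation, max_attempts=3, base_delay=1.0):
    """Retry `operation` with exponential backoff between attempts
    (1s, 2s, 4s with the defaults)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed: {exc}; retrying in {delay:.0f}s")
            time.sleep(delay)
```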



Owner

@rtuszik rtuszik left a comment


LGTM

@rtuszik
Owner

rtuszik commented Oct 31, 2025

Thanks for the fix!

@rtuszik rtuszik changed the base branch from main to dev October 31, 2025 17:18
@rtuszik rtuszik changed the base branch from dev to main October 31, 2025 17:35
@rtuszik rtuszik merged commit 8bd31a5 into rtuszik:main Oct 31, 2025
9 checks passed


Development

Successfully merging this pull request may close these issues.

[BUG] Unable to download planet, max. 3 download attempts, read timed outs and file size mismatches
